
Out-of-sample prediction



Parsimonious Bayesian deep networks

Mingyuan Zhou

Neural Information Processing Systems

A linear classifier is fast and resists overfitting but may not provide sufficient class separation, while an over-capacitized model often wastes computation and requires careful regularization to prevent overfitting. Rather than making this uneasy choice up front, we propose a parsimonious Bayesian deep network (PBDN) that builds capacity regularization into the greedy layer-wise construction and training of the deep network.


Reviews: Gaussian process state-space models with particle MCMC

Neural Information Processing Systems

This paper presents a Bayesian approach to state and parameter estimation in nonlinear state-space models, while also learning the transition dynamics through a Gaussian process (GP) prior. The inference mechanism is based on particle Markov chain Monte Carlo (PMCMC) with the recently introduced idea of ancestor sampling. The paper also discusses the computational efficiencies to be had from sparsity and low-rank Cholesky updates. This is a technically sound and strong paper with a clear and accessible presentation.


Reviews: Parsimonious Bayesian deep networks

Neural Information Processing Systems

The paper introduces a new type of (deep) neural network for binary classification. Each layer is in principle infinitely wide, but in practice a finite number of units is used. The layers are trained greedily: one layer is trained first, and each subsequent layer is then trained after the previous one. The main claim is that the proposed model gives results comparable to the alternative approaches while using fewer hyperplanes, which results in faster out-of-sample prediction. The approach seems somewhat novel, and the results support the claim to some extent.


Classical Statistical (In-Sample) Intuitions Don't Generalize Well: A Note on Bias-Variance Tradeoffs, Overfitting and Moving from Fixed to Random Designs

Curth, Alicia

arXiv.org Machine Learning

The sudden appearance of modern machine learning (ML) phenomena like double descent and benign overfitting may leave many classically trained statisticians feeling uneasy -- these phenomena appear to go against the very core of statistical intuitions conveyed in any introductory class on learning from data. The historical lack of earlier observation of such phenomena is usually attributed to today's reliance on more complex ML methods, overparameterization, interpolation and/or higher data dimensionality. In this note, we show that there is another reason why we observe behaviors today that appear at odds with intuitions taught in classical statistics textbooks, which is much simpler to understand yet rarely discussed explicitly. In particular, many intuitions originate in fixed design settings, in which in-sample prediction error (under resampling of noisy outcomes) is of interest, while modern ML evaluates its predictions in terms of generalization error, i.e. out-of-sample prediction error in random designs. Here, we highlight that this simple move from fixed to random designs has (perhaps surprisingly) far-reaching consequences on textbook intuitions relating to the bias-variance tradeoff, and comment on the resulting (im)possibility of observing double descent and benign overfitting in fixed versus random designs.
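The fixed-vs-random-design distinction the note draws can be made concrete with a small simulation. The sketch below is our own illustration (not from the note): for ordinary least squares, it estimates the fixed-design in-sample error (same inputs, resampled noise) and the random-design generalization error (fresh inputs) side by side; all dimensions and seeds are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative setup: linear model y = X @ beta + noise.
n, p, sigma = 50, 5, 1.0
X = rng.normal(size=(n, p))
beta = rng.normal(size=p)

fixed_err, random_err = [], []
for _ in range(2000):
    y = X @ beta + sigma * rng.normal(size=n)
    b = np.linalg.lstsq(X, y, rcond=None)[0]  # OLS fit
    # Fixed design: same training inputs X, fresh noisy outcomes.
    y_new = X @ beta + sigma * rng.normal(size=n)
    fixed_err.append(np.mean((X @ b - y_new) ** 2))
    # Random design: fresh inputs X' drawn from the input distribution.
    Xp = rng.normal(size=(n, p))
    yp = Xp @ beta + sigma * rng.normal(size=n)
    random_err.append(np.mean((Xp @ b - yp) ** 2))

print(np.mean(fixed_err), np.mean(random_err))
```

Both averages exceed the noise floor sigma^2 = 1, but the inflation terms differ: roughly sigma^2 * p / n in the fixed design versus a design-dependent term in the random design, which is the gap the note traces through the textbook bias-variance intuitions.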


Robust Inference of Dynamic Covariance Using Wishart Processes and Sequential Monte Carlo

Huijsdens, Hester, Leeftink, David, Geerligs, Linda, Hinne, Max

arXiv.org Machine Learning

A Bayesian nonparametric model known as the Wishart process has been shown to be effective in this setting, but its inference remains highly challenging. In this work, we introduce a Sequential Monte Carlo (SMC) sampler for the Wishart process, and show how it compares to conventional inference approaches, namely MCMC and variational inference. Using simulations, we show that SMC sampling results in the most robust estimates and out-of-sample predictions of dynamic covariance. SMC especially outperforms the alternative approaches when using composite covariance functions with correlated parameters. We demonstrate the practical applicability of our proposed approach on a dataset of clinical depression (n = 1), and show how an accurate representation of the posterior distribution can be used to test for dynamics in the covariance.
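The generic SMC recipe the abstract builds on — propagate particles, reweight by the likelihood, resample — can be sketched on a toy model. The code below is a hypothetical illustration on a simple linear-Gaussian state-space model (a bootstrap particle filter), not the paper's Wishart-process sampler; all parameter values are made up for the demo.

```python
import numpy as np

rng = np.random.default_rng(1)

T, N = 100, 500            # time steps, particles
phi, q, r = 0.9, 0.5, 1.0  # AR coefficient, state noise, observation noise

# Simulate a latent AR(1) state x and noisy observations y.
x = np.zeros(T)
for t in range(1, T):
    x[t] = phi * x[t - 1] + np.sqrt(q) * rng.normal()
y = x + np.sqrt(r) * rng.normal(size=T)

# Bootstrap particle filter (sequential importance sampling with resampling).
particles = rng.normal(size=N)
means = np.zeros(T)
for t in range(T):
    # 1) Propagate each particle through the transition kernel.
    particles = phi * particles + np.sqrt(q) * rng.normal(size=N)
    # 2) Reweight by the observation likelihood (log-space for stability).
    logw = -0.5 * (y[t] - particles) ** 2 / r
    w = np.exp(logw - logw.max())
    weights = w / w.sum()
    means[t] = np.sum(weights * particles)
    # 3) Multinomial resampling to combat weight degeneracy.
    particles = particles[rng.choice(N, size=N, p=weights)]

print(np.mean((means - x) ** 2))
```

The filtered mean should track the latent state more closely than the raw observations do; the paper's contribution is making this machinery work for the much harder Wishart-process posterior.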


Predict the Future from the Past? On the Temporal Data Distribution Shift in Financial Sentiment Classifications

Guo, Yue, Hu, Chenxi, Yang, Yi

arXiv.org Artificial Intelligence

Temporal data distribution shift is prevalent in financial text. How can a financial sentiment analysis system be trained in a volatile market environment so that it accurately infers sentiment and is robust to temporal data distribution shifts? In this paper, we conduct an empirical study of financial sentiment analysis systems under temporal data distribution shifts, using a real-world financial social media dataset that spans three years. We find that fine-tuned models suffer from general performance degradation in the presence of temporal distribution shifts. Furthermore, motivated by the unique temporal nature of financial text, we propose a novel method that combines out-of-distribution detection with time series modeling for temporal financial sentiment analysis. Experimental results show that the proposed method enhances the model's capability to adapt to evolving temporal shifts in a volatile financial market.


Modeling Short Time Series with Prior Knowledge in PyMC - Dr. Juan Camilo Orduz

#artificialintelligence

The mean \(\mu_t\) of this distribution is modeled using three components: seasonality (\(\lambda_t\)), an autoregressive term on the latent mean (\(\mu_{t - 1}\)), and an autoregressive sales term. The seasonality component includes a linear trend, in-week seasonality via day-of-week indicator functions, and long-term seasonality modeled using Fourier modes. The key point is that the priors on these Fourier modes are determined by the posterior distribution obtained from the temperature model. We now write the model above in PyMC. As always, it is good to run prior predictive checks before fitting the model.
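Before turning to the PyMC code, the mean structure itself can be sketched deterministically. The snippet below is a hypothetical numpy illustration of the three components described above (all function names and coefficient values are ours, not the post's); the actual post places priors on these quantities in PyMC rather than fixing them.

```python
import numpy as np

def seasonality(t, trend=0.01, dow_effects=None, fourier_coefs=None, period=365.25):
    """lambda_t: linear trend + day-of-week effect + long-term Fourier seasonality."""
    if dow_effects is None:
        dow_effects = np.zeros(7)          # one indicator effect per weekday
    if fourier_coefs is None:
        fourier_coefs = []                 # list of (a_k, b_k) pairs
    lam = trend * t + dow_effects[t % 7]
    for k, (a_k, b_k) in enumerate(fourier_coefs, start=1):
        lam += (a_k * np.cos(2 * np.pi * k * t / period)
                + b_k * np.sin(2 * np.pi * k * t / period))
    return lam

def latent_mean(T, a=0.5, b=0.1, sales=None, **seas_kwargs):
    """Roll the latent mean forward: mu_t = lambda_t + a*mu_{t-1} + b*sales_{t-1}."""
    if sales is None:
        sales = np.zeros(T)
    mu = np.zeros(T)
    for t in range(1, T):
        mu[t] = seasonality(t, **seas_kwargs) + a * mu[t - 1] + b * sales[t - 1]
    return mu

mu = latent_mean(30,
                 dow_effects=np.array([0, 0, 0, 0, 0, 1.0, 1.0]),  # weekend bump
                 fourier_coefs=[(0.5, 0.2)])
print(mu[:5])
```

In the Bayesian version, the Fourier coefficients \((a_k, b_k)\) would carry informative priors derived from the temperature model's posterior, which is exactly how the post injects prior knowledge into a short series.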


Population modeling with machine learning can enhance measures of mental health

#artificialintelligence

Figure 1 – Figure supplement 1: Learning curves on the random split-half validation used for model building. To facilitate comparisons, we evaluated predictions of age, fluid intelligence and neuroticism from a complete set of socio-demographic variables without brain imaging using the coefficient of determination R2 metric (y-axis) to compare results obtained from 100 to 3000 training samples (x-axis). The cross-validation (CV) distribution was obtained from 100 Monte Carlo splits. Across targets, performance started to plateau after around 1000 training samples with scores virtually identical to the final model used in subsequent analyses. These benchmarks suggest that inclusion of additional training samples would not have led to substantial improvements in performance.
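The evaluation scheme described here — Monte Carlo split-half cross-validation, scoring with \(R^2\) at increasing training-set sizes — can be sketched on synthetic data. The snippet below is a hypothetical illustration (the study predicts age, fluid intelligence and neuroticism from socio-demographic variables; here we use a made-up linear target and ordinary least squares).

```python
import numpy as np

rng = np.random.default_rng(2)

def r2_score(y_true, y_pred):
    """Coefficient of determination R^2."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Synthetic stand-in data: 4000 samples, 20 predictors, linear target.
n, p = 4000, 20
X = rng.normal(size=(n, p))
y = X @ rng.normal(size=p) + rng.normal(size=n)

def learning_curve_point(n_train, n_splits=100):
    """Mean R^2 over Monte Carlo splits: train on n_train samples, test on held-out half."""
    scores = []
    for _ in range(n_splits):
        idx = rng.permutation(n)
        tr, te = idx[:n_train], idx[n // 2:]   # n_train <= n//2, so no overlap
        coef = np.linalg.lstsq(X[tr], y[tr], rcond=None)[0]
        scores.append(r2_score(y[te], X[te] @ coef))
    return np.mean(scores)

scores = [learning_curve_point(m) for m in (100, 1000, 2000)]
print(scores)
```

As in the figure, the curve rises with training-set size and then flattens, which is the basis for the claim that additional samples would not have substantially improved performance.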


Stock2Vec: A Hybrid Deep Learning Framework for Stock Market Prediction with Representation Learning and Temporal Convolutional Network

Wang, Xing, Wang, Yijun, Weng, Bin, Vinel, Aleksandr

arXiv.org Machine Learning

We propose a global hybrid deep learning framework to predict daily prices in the stock market. Through representation learning, we derive an embedding called Stock2Vec, which gives insight into the relationships among different stocks, while temporal convolutional layers automatically capture effective temporal patterns both within and across series. Evaluated on the S&P 500, our hybrid framework combines both advantages and achieves better performance on the stock price prediction task than several popular benchmark models.
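The building block behind temporal convolutional layers is a causal convolution, optionally dilated so the receptive field grows without look-ahead: the output at time t depends only on inputs at times <= t, which matters when modeling prices. The sketch below is a minimal, hypothetical numpy version (real TCNs stack many such layers with learned kernels, residual connections, and nonlinearities).

```python
import numpy as np

def causal_dilated_conv(x, kernel, dilation=1):
    """y[t] = sum_j kernel[j] * x[t - j*dilation], zero-padded on the left."""
    k = len(kernel)
    pad = (k - 1) * dilation
    xp = np.concatenate([np.zeros(pad), x])  # left padding keeps it causal
    return np.array([
        sum(kernel[j] * xp[pad + t - j * dilation] for j in range(k))
        for t in range(len(x))
    ])

x = np.arange(8, dtype=float)
# Kernel of width 2 with dilation 2: y[t] = 0.5*x[t] + 0.5*x[t-2].
y = causal_dilated_conv(x, kernel=np.array([0.5, 0.5]), dilation=2)
print(y)  # -> [0.  0.5 1.  2.  3.  4.  5.  6. ]
```

Because the padding is purely on the left, perturbing a future input never changes an earlier output, and stacking layers with dilations 1, 2, 4, ... gives the exponentially growing receptive field that lets the framework capture patterns across long price histories.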